Visual object detection from lifelogs using visual non-lifelog data
Research into lifelog analysis, especially visual lifelogging, has progressed more slowly than expected, limited by insufficient training data. To advance research on object detection in visual lifelogs, this thesis builds a deep learning model that enhances learning on visual lifelogs by drawing on other, more readily available sources of visual (non-lifelog) data.
Through theoretical analysis and empirical validation, the first part of the thesis identifies the close relationship between lifelog images and non-lifelog images. The second part employs a domain-adversarial convolutional neural network to transfer knowledge from the domain of visual non-lifelog data to the domain of visual lifelogs. The third part addresses visual object detection in lifelogs, an approach that could be readily extended to other related lifelog tasks.
The study has four intended outcomes. On a theoretical level, it identifies the relationship between visual non-lifelog data and visual lifelog data from the perspective of computer vision. On a practical level, it demonstrates how to apply domain adaptation, using variants of convolutional neural networks, to enhance learning on visual lifelogs by transferring knowledge from visual non-lifelogs. It also contributes the release of a visual non-lifelog dataset that corresponds to an existing visual lifelog dataset. Finally, it suggests that visual object detection from lifelogs could be readily applied to other visual lifelogging tasks.
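The abstract above does not say how the domain-adversarial network is trained. One common mechanism in domain-adversarial training is a gradient reversal layer, and the sketch below illustrates only that idea in plain Python. The class `GradientReversal`, the helper `feature_gradient`, and the parameter `lam` are hypothetical names chosen for illustration, not taken from the thesis.

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass.

    Assumption: this mirrors the gradient reversal idea often used in
    domain-adversarial training; it is not the thesis's implementation.
    """

    def __init__(self, lam=1.0):
        self.lam = lam  # strength of the adversarial signal (hypothetical name)

    def forward(self, features):
        # Features pass through unchanged on their way to the domain classifier.
        return list(features)

    def backward(self, grad_output):
        # The domain classifier's gradient is flipped before reaching the
        # feature extractor, pushing features to be domain-invariant.
        return [-self.lam * g for g in grad_output]


def feature_gradient(label_grad, domain_grad, grl):
    """Combine the label-loss gradient with the reversed domain-loss gradient."""
    reversed_grad = grl.backward(domain_grad)
    return [a + b for a, b in zip(label_grad, reversed_grad)]


if __name__ == "__main__":
    grl = GradientReversal(lam=2.0)
    print(grl.forward([1.0, -2.0]))
    print(feature_gradient([1.0, 1.0], [0.5, -1.0], grl))
```

In this sketch the feature extractor receives the label gradient plus the negated domain gradient, so it learns features that help the label task while making the two domains (here, lifelog and non-lifelog images) harder to tell apart.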
Transfer nonnegative matrix factorization for image representation
Nonnegative Matrix Factorization (NMF) has received considerable attention due to its psychological and physiological interpretation of naturally occurring data whose representation may be parts-based in the human brain. However, when labeled and unlabeled images are sampled from different distributions, they may be quantized into different basis vector spaces and represented in different coding vector spaces, which may lead to low representation fidelity. In this paper, we investigate how to extend NMF to cross-domain scenarios. We accomplish this goal through TNMF, a novel semi-supervised transfer learning approach. Specifically, we aim to minimize the distribution divergence between labeled and unlabeled images, and incorporate this criterion into the objective function of NMF to construct new robust representations. Experiments show that TNMF outperforms state-of-the-art methods on real datasets.
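For orientation, the sketch below shows plain NMF only, fitted with the standard multiplicative update rules for the squared Frobenius error. It is the baseline that TNMF extends, not TNMF itself: the distribution-divergence term described in the abstract is deliberately left out because the abstract does not give its form. The function names and default values are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, u, v):
    """Squared Frobenius norm of X - UV."""
    approx = matmul(u, v)
    return sum((xi - ai) ** 2
               for xr, ar in zip(x, approx)
               for xi, ai in zip(xr, ar))


def nmf(x, rank, iterations=200, seed=0, eps=1e-9):
    """Plain NMF via multiplicative updates (no transfer term).

    Returns nonnegative factors U (basis vectors) and V (codings) with X ~ UV.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    u = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    v = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # V <- V * (U^T X) / (U^T U V), element-wise
        ut = transpose(u)
        num, den = matmul(ut, x), matmul(matmul(ut, u), v)
        v = [[v[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # U <- U * (X V^T) / (U V V^T), element-wise
        vt = transpose(v)
        num, den = matmul(x, vt), matmul(u, matmul(v, vt))
        u = [[u[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return u, v


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [3.0, 2.0, 5.0]]
    basis, codes = nmf(data, rank=2)
    print(frobenius_error(data, basis, codes))
```

A transfer variant along the lines of the abstract would add a penalty on the divergence between the codings of labeled and unlabeled images to this objective, which changes the update for `v`.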
Negative faceblurring: a privacy-by-design approach to visual lifelogging with Google Glass
Wearable devices such as Google Glass are receiving increasing attention and look set to become part of our technical landscape over the next few years. At the same time, lifelogging is growing in popularity, with a host of new devices on the market that visually capture life experience in an automated manner. We describe a visual lifelogging solution for Google Glass that is designed to capture life experience in rich visual detail, yet maintain the privacy of unknown bystanders.
Real-time behavioural analysis using Google Glass
Lifelogging is a form of pervasive computing in which people digitally record their own daily lives in varying amounts of detail, for a variety of purposes. Lifelogging offers huge potential for supporting behaviour change because it can capture the totality of life experience and provide heretofore unknown levels of insight into the real-world activities of the lifelogger. In this paper we present a real-time curated lifelogging prototype that supports real-time behavioural analysis by providing immediate feedback and intervention to the lifelogger.
Bridged Transformer for Vision and Point Cloud 3D Object Detection
3D object detection is a crucial research topic in computer vision, and conventional setups usually take 3D point clouds as input. Recently, there has been a trend of leveraging multiple sources of input data, such as complementing the 3D point cloud with 2D images, which often have richer color and less noise. However, the heterogeneous geometry of the 2D and 3D representations prevents us from applying off-the-shelf neural networks to achieve multimodal fusion. To that end, we propose Bridged Transformer (BrT), an end-to-end architecture for 3D object detection. BrT is simple and effective, and learns to identify 3D and 2D object bounding boxes from both points and image patches. A key element of BrT is the use of object queries to bridge the 3D and 2D spaces, which unifies different sources of data representations in a Transformer. We adopt a form of feature aggregation realized by point-to-patch projections, which further strengthens the correlations between images and points. Moreover, BrT works seamlessly for fusing the point cloud with multi-view images. We experimentally show that BrT surpasses state-of-the-art methods on the SUN RGB-D and ScanNetV2 datasets.
Comment: CVPR 202
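As a rough illustration of what a point-to-patch projection can look like, the sketch below maps a 3D point to the image patch it falls in. It assumes a simple pinhole camera with the point already expressed in camera coordinates. The function name, the intrinsics `fx`, `fy`, `cx`, `cy`, and the square patch grid are assumptions for illustration and are not taken from the paper.

```python
def project_point_to_patch(point, fx, fy, cx, cy, image_size, patch_size):
    """Map a 3D point in camera coordinates to the (row, col) of an image patch.

    Assumes a pinhole camera model. Returns None when the point lies behind
    the camera or projects outside the image.
    """
    x, y, z = point
    if z <= 0:
        return None  # behind the camera, no valid projection
    # Perspective projection onto the pixel grid.
    u = fx * x / z + cx
    v = fy * y / z + cy
    width, height = image_size
    if not (0 <= u < width and 0 <= v < height):
        return None  # falls outside the image bounds
    # Integer-divide pixel coordinates by the patch size to get the patch index.
    return int(v // patch_size), int(u // patch_size)


if __name__ == "__main__":
    patch = project_point_to_patch(
        (0.0, 0.0, 2.0), fx=100.0, fy=100.0, cx=64.0, cy=48.0,
        image_size=(128, 96), patch_size=16,
    )
    print(patch)
```

With such a mapping, features of a point and of its corresponding patch can be paired before aggregation; how BrT actually combines them is not specified in the abstract.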
Demographic attributes prediction using extreme learning machine
Demographic attribute prediction is fundamental to many real-world applications, such as recommendation, personalized search, and behavior targeting. Although demographic attribute prediction involves a variety of subjects (for example, psychology requires recognizing and predicting demographics), the traditional approach is dynamic modeling on a specific field and distinctive datasets. However, dynamic modeling costs researchers a lot of time and energy, and even once it is done, no one knows how good or bad the result is. To tackle these problems, this chapter proposes a framework that uses classifiers as its core and consists of three main components: data processing, prediction using classifiers, and prediction adjustment.
The data processing component cleans and formats the data. First, relatively independent data is extracted from the complicated original dataset. Next, the extracted data goes through different paths depending on its type. Finally, all the data is transformed into a demographic attributes matrix. For prediction, the demographic attributes matrix is taken as the input to the classifiers, and the testing dataset comes from the same matrix. The classifiers in the experiments include conventional state-of-the-art ones and the Extreme Learning Machine (ELM), a more recent classifier. Experiments on two unique datasets indicate that ELM outperforms the others. In the prediction adjustment stage, two kinds of adjustment strategies are proposed, for single target attributes and for multiple target attributes respectively. The single-target strategies are adjusting the parameters of the classifiers, adjusting the number of classes of the target attributes, and adjusting the public attributes. The multiple-target strategy uses the outputs of a first prediction as the inputs to a second prediction in order to improve on the accuracy of the first.
The proposed framework takes less time than traditional dynamic modeling methods, and because it follows regular patterns, researchers using it do not need to fully study the knowledge of the various subjects involved. In addition, the adjustment strategies place no restriction on the datasets, so they are universally applicable. In some cases dynamic modeling has the advantage of precision and yields better accuracy; even then, the results from the proposed framework can serve as a comparison.
In this work, a universal demographic attribute prediction framework based on the Extreme Learning Machine (ELM) is proposed to work on a variety of datasets. The framework consists of three main components: first, processing raw data and extracting attribute features according to data type; second, predicting the desired attributes by classification; third, improving the accuracy of the classifiers through various adjustment strategies. Two experiments with different data types on real-world prediction problems demonstrate that the framework achieves better accuracy than other traditional state-of-the-art prediction methods.
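The abstract names the Extreme Learning Machine without describing it. As general background, an ELM is a single-hidden-layer network whose hidden weights are drawn at random and left fixed, so that only the output weights are fitted, by least squares. The sketch below is a minimal plain-Python version under assumptions of our own: a sigmoid activation, a small ridge term for numerical stability, and one-hot class targets. The names `ELM`, `solve`, and `ridge` are illustrative and do not come from the chapter.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b (a square, b a matrix) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [val / p for val in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class ELM:
    """Minimal Extreme Learning Machine classifier (illustrative sketch)."""

    def __init__(self, n_inputs, n_hidden, seed=0, ridge=1e-3):
        rng = random.Random(seed)
        # Hidden weights and biases are random and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = None

    def _hidden(self, x):
        # Sigmoid activation of each random hidden unit.
        return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys, n_classes):
        h = [self._hidden(x) for x in xs]
        t = [[1.0 if y == c else 0.0 for c in range(n_classes)] for y in ys]
        k = len(self.w)
        # Regularized normal equations: (H^T H + ridge * I) beta = H^T T
        hth = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        htt = [[sum(hr[i] * tr[c] for hr, tr in zip(h, t))
                for c in range(n_classes)] for i in range(k)]
        self.beta = solve(hth, htt)
        return self

    def predict(self, x):
        h = self._hidden(x)
        scores = [sum(h[i] * self.beta[i][c] for i in range(len(h)))
                  for c in range(len(self.beta[0]))]
        return scores.index(max(scores))


if __name__ == "__main__":
    xs = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 0.8], [0.8, 0.9]]
    ys = [0, 0, 0, 1, 1, 1]
    model = ELM(n_inputs=2, n_hidden=10).fit(xs, ys, n_classes=2)
    print([model.predict(x) for x in xs])
```

Because only the output weights are solved for, training reduces to one linear solve, which is consistent with the abstract's point that the framework takes less time than dynamic modeling.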
MemLog, an enhanced Lifelog annotation and search tool
Recently, a convergence of technologies has led to the emergence of lifelogging as a potentially pervasive technology with many real-world use cases. While it is becoming easier to gather massive lifelog data archives with wearable cameras and sensors, there are still challenges in developing effective retrieval systems. One such challenge is gathering annotations to support user access or machine learning tasks in an effective and efficient manner. In this work, we demonstrate a web-based annotation system for sensory and visual lifelog data and show it in operation on a large archive of nearly 1 million lifelog images and 27 semantic concepts in 4 categories.